6.4 LWS-Det: Layer-Wise Search for 1-bit Detectors
FIGURE 6.9
Example layer-wise feature map distributions and detection results of (a) a real-valued detector, (b) LWS-Det, and (c) BiDet. We extract the feature maps of the first, second, and final binarized layers and illustrate their distributions as frequency-value histograms in rows 1–3. The last row shows the detection results.
Figure 6.9 shows the layer-wise feature map distributions and detection results of a real-valued detector, our LWS-Det, and BiDet [240], from left to right. The first three rows show the distributions of the feature maps. The distribution of BiDet's feature maps differs more in variance from that of the real-valued detector, leading to false positives and missed detections in the fourth row. In comparison, our LWS-Det reduces the binarization error and provides better detection results.
In this section, we present a layer-wise search method that produces an optimized 1-bit detector (LWS-Det) [264], using the student-teacher framework to narrow the performance gap. As shown in Fig. 6.10, we minimize the binarization error by decoupling it into angular and amplitude errors. We search for the binarized weights under a differentiable binarization search (DBS) framework, following the DARTS method [151, 305], supervised by well-designed losses between the real-valued convolution and the 1-bit convolution. We formulate binarization as selecting a combination of −1 and +1 for each weight, so that a differentiable search can explore the binary space and significantly improve the capacity of 1-bit detectors. To improve the representation ability of LWS-Det, we design two losses that supervise each 1-bit convolution layer from the angular and amplitude perspectives. In this way, we obtain a powerful 1-bit detector (LWS-Det) that minimizes the angular and amplitude errors within a single framework.
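To make the DBS idea concrete, the following is a minimal sketch rather than the authors' implementation: it assumes a DARTS-style relaxation in which every weight entry carries two learnable logits over the candidate values {−1, +1}, a per-channel scale models the amplitude, and the layer-wise supervision combines an angular (cosine) term with an amplitude (magnitude) term between the real-valued teacher convolution and the 1-bit student convolution. The class and function names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DBSConv2d(nn.Module):
    """Sketch of differentiable binarization search for one 1-bit conv layer.

    Each weight entry holds two logits over the candidates {-1, +1}; during
    search the effective weight is their softmax-weighted mixture (a
    DARTS-style relaxation), scaled per output channel by alpha.
    """
    def __init__(self, real_conv: nn.Conv2d):
        super().__init__()
        self.stride, self.padding = real_conv.stride, real_conv.padding
        # Architecture parameters: one pair of logits per weight entry.
        self.logits = nn.Parameter(torch.zeros(*real_conv.weight.shape, 2))
        # Amplitude (scale) parameter, one per output channel.
        self.alpha = nn.Parameter(torch.ones(real_conv.out_channels, 1, 1, 1))
        self.register_buffer("candidates", torch.tensor([-1.0, 1.0]))

    def effective_weight(self):
        probs = F.softmax(self.logits, dim=-1)          # (Cout, Cin, K, K, 2)
        w_bin = (probs * self.candidates).sum(dim=-1)   # soft {-1, +1} choice
        return self.alpha * w_bin

    def forward(self, x):
        return F.conv2d(x, self.effective_weight(),
                        stride=self.stride, padding=self.padding)

def layer_wise_loss(student_out, teacher_out, eps=1e-8):
    """Angular loss (cosine distance) plus amplitude loss (magnitude MSE)."""
    s, t = student_out.flatten(1), teacher_out.flatten(1)
    angular = (1.0 - F.cosine_similarity(s, t, dim=1, eps=eps)).mean()
    amplitude = F.mse_loss(s.norm(dim=1), t.norm(dim=1))
    return angular + amplitude
```

After the search converges, each weight would be fixed to the candidate with the larger logit, yielding a discrete {−1, +1} weight, while the learned per-channel scale retains the amplitude information.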
6.4.1 Preliminaries
Given a conventional CNN model, we denote by $w_i \in \mathbb{R}^{n_i}$ and $a_i \in \mathbb{R}^{m_i}$ its weights and feature maps in the $i$-th layer, where $n_i = C_i \cdot C_{i-1} \cdot K_i \cdot K_i$ and $m_i = C_i \cdot W_i \cdot H_i$. $C_i$ represents the number of output channels of the $i$-th layer, $(W_i, H_i)$ are the width and height of the feature maps, and $K_i$ is the kernel size. Then we have
$$a_i = a_{i-1} \otimes w_i, \qquad (6.65)$$
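As a concrete illustration of these shapes, the following snippet realizes Eq. (6.65) for a single layer; the values of $C_{i-1}$, $C_i$, $K_i$, $W_i$, and $H_i$ are arbitrary examples, not taken from any particular detector.

```python
import torch
import torch.nn.functional as F

# Example values for the i-th layer (illustrative only).
C_prev, C_i, K_i, W_i, H_i = 64, 128, 3, 56, 56

a_prev = torch.randn(1, C_prev, H_i, W_i)        # a_{i-1}
w_i = torch.randn(C_i, C_prev, K_i, K_i)         # n_i = C_i * C_{i-1} * K_i * K_i weights
a_i = F.conv2d(a_prev, w_i, padding=K_i // 2)    # a_i = a_{i-1} (conv) w_i, Eq. (6.65)

assert a_i.shape == (1, C_i, H_i, W_i)           # m_i = C_i * W_i * H_i values per image
```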